Linear pattern matching on sparse suffix trees

Authors: Kolpakov Roman, Kucherov Gregory, Starikovskaya Tatiana
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/03/2011

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse suffix tree} \cite{KU-96} with appropriately defined suffix links. Assuming, under the standard unit-cost RAM model, that a word can store up to $\log_{\sigma}n$ characters ($\sigma$ being the alphabet size), our index takes $O(n/\log_{\sigma}n)$ space, i.e. the same space as the packed string itself. The resulting pattern matching algorithm runs in time $O(m+r^2+r\cdot occ)$, where $m$ is the length of the pattern, $r$ is the actual number of characters stored in a word and $occ$ is the number of pattern occurrences.
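
As a rough illustration of the packing idea only (not the index or the matching algorithm of the paper), the sketch below stores $r$ characters per word. The function names `pack` and `unpack`, the fixed `word_bits` parameter and the DNA alphabet are assumptions made for this example; the abstract instead assumes a word holding up to $\log_{\sigma}n$ characters.

```python
def pack(text, alphabet, word_bits=64):
    """Pack `text` into integers, r characters per word (hypothetical helper)."""
    # Bits needed for one character of an alphabet of size sigma.
    bits = max(1, (len(alphabet) - 1).bit_length())
    r = word_bits // bits  # characters that fit into one word
    code = {c: i for i, c in enumerate(alphabet)}
    words = []
    for start in range(0, len(text), r):
        w = 0
        for c in text[start:start + r]:
            w = (w << bits) | code[c]
        words.append(w)
    return words, r


def unpack(words, alphabet, n, word_bits=64):
    """Inverse of `pack`; `n` is the length of the original text."""
    bits = max(1, (len(alphabet) - 1).bit_length())
    r = word_bits // bits
    mask = (1 << bits) - 1
    out = []
    for idx, w in enumerate(words):
        count = min(r, n - idx * r)  # the last word may hold fewer characters
        for j in range(count):
            shift = bits * (count - 1 - j)
            out.append(alphabet[(w >> shift) & mask])
    return "".join(out)


words, r = pack("ACGTACGTAC", "ACGT", word_bits=8)
restored = unpack(words, "ACGT", 10, word_bits=8)
```

With a 4-letter alphabet and 8-bit words, each word holds 4 characters, so the 10-character text occupies 3 words.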

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

On the number of Dejean words over alphabets of 5, 6, 7, 8, 9 and 10 letters

Authors: Kolpakov Roman, Rao Michael
Publication date: 16/05/2011

We give lower bounds on the growth rate of Dejean words, i.e. minimally repetitive words, over a $k$-letter alphabet, for $k=5, 6, 7, 8, 9, 10$. Put together with the known upper bounds, we estimate these growth rates with a precision of 0.005. As a consequence, we establish the exponential growth of the number of Dejean words over a $k$-letter alphabet, for $k=5, 6, 7, 8, 9, 10$. Comment: 13 pages.

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Finding approximate repetitions under Hamming distance

Authors: Kolpakov Roman, Kucherov Gregory
Publication venue: HAL CCSD
Publication date: 01/01/2001

The problem of computing tandem repetitions with $K$ possible mismatches is studied. Two main definitions are considered, and for both of them an $O(nK\log K+S)$ algorithm is proposed ($S$ being the size of the output). This improves, in particular, the bound obtained in \cite{LS93}. Finally, other possible definitions are briefly analyzed.
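
To make the notion concrete, here is a brute-force baseline, not the $O(nK\log K+S)$ algorithm of the paper. It assumes one natural definition: a tandem repetition with at most $K$ mismatches is a factor $uv$ with $|u|=|v|$ whose halves differ in at most $K$ positions. The function names are hypothetical.

```python
def hamming(u, v):
    """Number of positions where equal-length strings u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def tandem_repeats_with_mismatches(s, k):
    """Return every (start, period) such that the two halves
    s[start:start+period] and s[start+period:start+2*period]
    are within Hamming distance k (assumed definition, naive scan)."""
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if hamming(s[i:i + p], s[i + p:i + 2 * p]) <= k:
                found.append((i, p))
    return found


exact = tandem_repeats_with_mismatches("abab", 0)
approx = tandem_repeats_with_mismatches("abcabd", 1)
```

The naive scan compares every pair of adjacent halves, so it is cubic in the worst case; it is useful only as a reference against which a faster implementation can be checked.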

CiteSeerX

INRIA a CCSD electronic archive server

On the sum of exponents of maximal repetitions in a word

Authors: Kolpakov Roman, Kucherov Gregory
Publication venue: HAL CCSD
Publication date: 01/01/1999

Internal report. This paper continues the study presented in \cite{KolpakovKucherovRI98}, where it was proved that the number of maximal repetitions in a word is linearly bounded in the word length. Here we strengthen this result and prove that the sum of exponents of maximal repetitions is linearly bounded too. Similarly to \cite{KolpakovKucherovRI98}, we first estimate the sum of exponents of maximal repetitions in Fibonacci words. Then we prove that the sum of exponents of all maximal repetitions in general words is linearly bounded. Finally, some algorithmic applications of these results are discussed.
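
The quantities involved can be explored on small inputs with the brute-force sketch below. It assumes the usual definition: a maximal repetition is a factor of minimal period $p$ and length at least $2p$ that cannot be extended to the left or right while keeping period $p$, and its exponent is its length divided by $p$. All function names, and the indexing convention of `fibonacci_word`, are choices made for this example.

```python
from fractions import Fraction


def minimal_period(w):
    """Smallest p >= 1 such that w[i] == w[i + p] for all valid i."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n


def maximal_repetitions(s):
    """List (start, end, period) for every maximal repetition s[start:end]."""
    n = len(s)
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            a = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[a:end] is a maximal stretch with period p
            if end - a >= 2 * p and minimal_period(s[a:end]) == p:
                runs.append((a, end, p))
    return runs


def sum_of_exponents(s):
    """Sum of length/period over all maximal repetitions of s."""
    return sum(Fraction(end - a, p) for a, end, p in maximal_repetitions(s))


def fibonacci_word(k):
    """Fibonacci words with the convention f_0 = 'a', f_1 = 'ab'."""
    prev, cur = "b", "a"
    for _ in range(k):
        prev, cur = cur, cur + prev
    return cur


total = sum_of_exponents(fibonacci_word(8))
```

Comparing `sum_of_exponents(w)` with `len(w)` for growing Fibonacci words gives an empirical feel for the linear bound stated in the abstract; the sketch itself proves nothing.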

INRIA a CCSD electronic archive server

On repetition-free binary words of minimal density

Authors: Kolpakov Roman, Kucherov Gregory, Tarannikov Yuri
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1998

Peer-reviewed conference with proceedings. In \cite{KolpakovKucherovMFCS97}, a notion of minimal proportion (density) of one letter in $n$-th power-free binary words was introduced and some of its properties were proved. In this paper, we proceed with this study and substantially extend some of these results. First, we introduce and analyse a general notion of minimal letter density for any infinite set of words which do not contain a specified set of ``prohibited'' subwords. We then prove that for $n$-th power-free binary words, the density function is $\frac{1}{n}+\frac{1}{n^3}+\frac{1}{n^4}+{\cal O}(\frac{1}{n^5})$, refining the estimate from \cite{KolpakovKucherovMFCS97}. Following \cite{KolpakovKucherovMFCS97}, we also consider a natural generalization of $n$-th power-free words to $x$-th power-free words for real argument $x$. We prove that the minimal proportion of one letter in $x$-th power-free binary words, considered as a function of $x$, is discontinuous at all integer points $n\geq 3$. Finally, we give an estimate of the size of the jumps.

INRIA a CCSD electronic archive server